This paper introduces several data analysis routines
that were designed for interactive use with APL (A Programming
Language) and placed in the APL user library at the
sity estimation and probability plotting routines are both
general user an extensive tool to analyze either discrete

I. INTRODUCTION



The Naval Postgraduate School acquired APL (A Programming
Language) from IBM in 1974. Since that time more and
more students and faculty have become familiar with the
extensive and efficient capabilities of APL and have been
putting these features to good use. With the acquisition of
APL came several extensive library routines that are both
well documented and varied in scope. However, on close
examination of these library routines it was found that
statistics and data analysis were areas where some additions
would be particularly useful.

Because APL manipulates vectors, matrices and arrays
efficiently and easily, it is ideal for use in the area of
data analysis. After a thorough screening of the existing
APL library routines pertaining to data analysis, it was
found that by adding six data analysis routines to the
present library, the Naval Postgraduate School could enhance
its APL capability and provide the student and general user
with a more varied and flexible tool for analyzing data.

To this end the purpose of this thesis is (1) to describe
completely the six data analysis routines added to the
APL library, (2) to explain the features and capabilities of
each of the routines and (3) to demonstrate the use of each
of the routines with "real world data".

The data used in this paper come from two different
sources. The first source of data was tests performed
jointly by IBM Germany and the German Public Telephone
Network on errors in transmission of binary data on telephone
lines (Lewis & Cox, 1966). From this source two sets of data
are used and each data set contains the times between errors
in binary bits transmitted over telephone lines. The first
data set contains 672 elements (times-between-errors; actually
the number of bits between errors) and will hereafter be
referred to as "telephone data 1". The second data set
contains 736 elements and will be referred to as "telephone
data 2". The second source of data was the percent overrun or
underrun on selected military contracts during the year 1950
(Dixon, 1973). This data set contains 22 elements and will
be referred to as "cost overrun data".

II. HISTOGRAM ROUTINE



A. DESCRIPTION 

The first routine to be presented is the histogram routine,
which is used for estimating from given data the probability
density function f(x) of a continuous random variable.
The current APL library has several small histogram
routines that are general in nature but lack the overall
detail necessary for good data analysis. For this reason HIST
(histogram routine) was created. HIST represents the
adaptation and modification of the FORTRAN library version of
HISTG/F, which was developed at N.P.S. by D. R. Robinson
under the guidance of Professor P.A.W. Lewis. By modifying
and adapting HISTG/F to APL the power and efficiency of
the APL language could be put to full use.

A complete description of how HIST operates is contained
in the variable HISTHOW. If the user's APL workspace
is properly loaded (see section IX. B. for workspace
loading procedures) all that is necessary is to type
HISTHOW. The user then receives the following printed response
on the terminal:

HISTHOW 

SYNTAX HIST 

HIST ALLOWS YOU TO INTERACTIVELY OBTAIN A HISTOGRAM OF
YOUR DATA ALONG WITH A SET OF BASIC DESCRIPTIVE STATISTICS.
IN ADDITION, HIST HAS THE FOLLOWING CAPABILITIES WHICH ALLOW
YOU:

(1) THE OPTION OF A TITLE FOR YOUR HISTOGRAM

(2) THE OPTION OF DISPLAYING A SMOOTHED EMPIRICAL DENSITY
FUNCTION OVER THE HISTOGRAM

(3) THE OPTION OF SCALING AND SELECTING THE NUMBER OF
CELLS FOR YOUR HISTOGRAM

(4) THE OPTION OF SELECTING AN INTERVAL AND PERFORMING A
HISTOGRAM ON ALL THE DATA POINTS OR CONDITIONALLY
SELECTING AN INTERVAL IN THE RANGE OF THE DATA.

(5) THE OPTION OF HAVING YOUR OUTPUT APPEAR ON THE
OFFLINE PRINTER OR ON YOUR TERMINAL



WHEN YOU TYPE HIST YOU WILL BE ASKED TO DO THE FOLLOWING:

(1) ENTER YOUR DATA IN VECTOR FORM - YOU CAN TYPE YOUR DATA
IN SINGLY OR YOU CAN TYPE THE NAME OF A VARIABLE THAT
HAS YOUR DATA IN IT. YOU MUST ENSURE THAT YOU HAVE AT
LEAST 10 DATA POINTS IN YOUR VECTOR AND THAT THERE IS
SOME DIFFERENCES IN THE DATA POINTS (MAX SIZE OF INTEGER
VECTOR IS APPROX. 2500, MAX SIZE OF REAL VECTOR IS
2000). AFTER YOU HAVE ENTERED YOUR DATA YOU WILL BE
ASKED

(2) IF YOU DESIRE A SMOOTHED EMPIRICAL DENSITY FUNCTION OR
NOT. THE EMPIRICAL DENSITY FUNCTION WHEN PLOTTED GIVES
ESSENTIALLY A MORE EXACT PICTURE OF THE DATA THAN DOES
THE HISTOGRAM ALONE, ALTHOUGH THIS FEATURE IS SLIGHTLY
BLURRED BY THE PRECISION WHICH CAN BE OBTAINED WITH THE
APL BALL (THE APL FINE PLOT IS NOT PRESENTLY AVAILABLE
ON THE NPS SYSTEM). THE SMOOTHED EMPIRICAL DENSITY
IS DEFINED BY THE RELATION (LEWIS, LIU, ROBINSON, AND
ROSENBLATT, 1975; ROSENBLATT, 1956)

$$F_N(Z) = \frac{1}{N \times B(N)} \sum_{I=1}^{N} W\left(\frac{X_I - Z}{B(N)}\right)$$

WHERE N IS THE NUMBER OF DATA POINTS, B(N) IS A BANDWIDTH
FUNCTION,

$$B(N) = \frac{\mathrm{RANGE}}{\sqrt{N}}$$

AND W IS A WEIGHT FUNCTION,

$$W(Z) = \begin{cases} 0 & \text{IF } |Z| > 1 \\ 1 - |Z| & \text{OTHERWISE} \end{cases}$$

F(Z) IS COMPUTED FOR VALUES OF Z BETWEEN THE MAXIMUM
AND THE MINIMUM OF THE SAMPLE AND PLOTTED OVER THE
HISTOGRAM USING THE SYMBOL -F-. THE RELATIVE FREQUENCY
MARKS ON THE LEFT OF THE OUTPUT REFER TO THE HISTOGRAM,
AND NOT TO THE DENSITY FUNCTION. AFTER THIS QUERY YOU
WILL BE ASKED

(3) IF YOU DESIRE TO TITLE YOUR HISTOGRAM. IF YOU ELECT TO
TITLE YOUR HISTOGRAM, SIMPLY TYPE YOUR TITLE, ENSURING
THAT YOUR TITLE IS MORE THAN ONE CHARACTER IN LENGTH.
IF NO TITLE IS DESIRED JUST HIT THE CARRIAGE RETURN.
AFTER THE TITLE QUERY YOU WILL BE ASKED

(4) IF YOU WANT TO SET YOUR OWN SCALE AND THE NUMBER OF
CELLS. YOUR RESPONSE MUST BE A VECTOR OF 3 ELEMENTS.
THE FIRST ELEMENT IS THE NUMBER OF CELLS YOU DESIRE,
THIS MUST BE AN INTEGER BETWEEN 10 AND 28. THE
SECOND ELEMENT IS THE LEFT SCALE POINT AND THE THIRD
ELEMENT IS THE RIGHT SCALE POINT (HIST DOES NOT REQUIRE
THAT YOUR INTERVAL BE DIVISIBLE BY THE NUMBER OF CELLS).
IF YOU WANT HIST TO AUTOMATICALLY SCALE AND PICK THE
CELLS YOU SHOULD TYPE THE VECTOR 000. AFTER YOU
HAVE SELECTED YOUR SCALING TECHNIQUE YOU WILL BE ASKED

(5) IF YOU WANT DATA POINTS NOT INSIDE THE SCALE LIMITS
INCLUDED IN THE HISTOGRAM ROUTINE. MOST HISTOGRAMS LUMP
DATA POINTS THAT FALL OUTSIDE THE SCALE LIMITS IN THE
END CELLS. HOWEVER, HIST GIVES YOU THE OPTION OF
INCLUDING THEM OR EXCLUDING THEM, I.E. OF OBTAINING A
HISTOGRAM FOR THE CONDITIONAL DENSITY. AFTER YOUR
RESPONSE TO THIS QUERY YOU WILL BE ASKED

(6) IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE PRINTER
OR ON YOUR TERMINAL. IF YOU SELECT THE OFFLINE PRINTER
THE NEXT RESPONSE YOU WILL RECEIVE ON YOUR TERMINAL IS
- HISTOGRAM SENT TO PRINTER -. THIS RESPONSE WILL TAKE
SEVERAL SECONDS AND AFTER IT IS RECEIVED YOUR TERMINAL
IS FREE FOR FURTHER USE. HOWEVER, IF YOU ELECTED TO
HAVE YOUR HISTOGRAM PRINTED ON YOUR TERMINAL THE
PRINTING WOULD BEGIN IN JUST A FEW SECONDS BUT WOULD
TAKE BETWEEN 5 AND 10 MINUTES TO COMPLETE.



THE FOLLOWING BASIC DESCRIPTIVE STATISTICS ARE COMPUTED
AND PRINTED OUT BY HIST.

MEAN, MEDIAN, TRIMEAN, MIDMEAN, MODE

GEOMETRIC AND HARMONIC MEANS (POSITIVE SAMPLES ONLY)

VARIANCE, STANDARD DEVIATION, COEFFICIENT OF VARIATION,
RANGE AND MIDSPREAD

THIRD AND FOURTH CENTRAL MOMENTS, COEFFICIENTS OF
SKEWNESS AND KURTOSIS

MAXIMUM, MINIMUM AND 5 SAMPLE QUANTILES

IN ADDITION, THE MEAN IS DISPLAYED ON THE HISTOGRAM BY A
VERTICAL COLUMN OF -M- AND THE QUARTILES BY COLUMNS OF
DOTS.

INTERPRETING THE OUTPUT

THE DEFINITIONS OF THE BASIC STATISTICS COMPUTED BY HIST
ARE LISTED BELOW. PAGE NUMBER REFERENCES ARE TO THE CRC
STANDARD MATH TABLES, 19TH EDITION (1971).

MEAN            AVERAGE OF THE SAMPLE (P 554).

MEDIAN          MID-VALUE OF THE SAMPLE, IF THERE ARE AN ODD
                NUMBER OF SAMPLE POINTS, OR THE AVERAGE OF THE TWO
                MIDDLE VALUES FOR AN EVEN NUMBER OF POINTS (P 555).

SAMPLE          THE Q(1) = .25, Q(2) = .50, AND Q(3) = .75 POPULATION
QUARTILES       QUARTILES ARE THE SOLUTION TO THE EQUATION
                PROB(X <= X(Q(I))) = Q(I), I = 1,2,3. THE SAMPLE
                QUARTILES, WHICH ESTIMATE THE POPULATION QUARTILES,
                ARE THE JTH ORDERED VALUE IN THE SAMPLE, WHERE
                J = [Q(I) x N] + 1, WHERE N = SAMPLE SIZE.

TRIMEAN         0.25 x (Q(1) + 2Q(2) + Q(3)), WHERE THE Q(I) ARE
                THE QUARTILES.

MIDMEAN         THE AVERAGE OF ALL THE SAMPLE VALUES BETWEEN THE
                UPPER AND LOWER QUARTILES.

MODE            THE DATA POINT THAT OCCURS MOST OFTEN (IF ALL THE
                DATA POINTS ARE DIFFERENT OR IF THERE ARE MORE
                THAN 300 DATA POINTS THE MODE WILL NOT BE PRINTED.
                IF TWO OR MORE MODES OCCUR HIST WILL PRINT THE
                FIRST MODE.)

MIDRANGE        AVERAGE OF THE MAXIMUM AND MINIMUM.

GEOMETRIC       (P 554).
MEAN

HARMONIC        (P 555).
MEAN

VARIANCE        (P 557). UNBIASED ESTIMATORS FOR VARIANCE AND
                STANDARD DEVIATION ARE USED.

STANDARD        (P 557).
DEVIATION

COEFFICIENT     STANDARD DEVIATION ÷ |MEAN|. WHEN THE MEAN IS
OF VARIATION    LESS THAN 1E-30, THE COEFFICIENT OF VARIATION IS
                SET TO ZERO.

MEAN            (P 556). THE AVERAGE OF THE SUM OF THE ABSOLUTE
DEVIATION       DIFFERENCES BETWEEN THE SAMPLE VALUES AND THE
                MEDIAN.

RANGE           MAXIMUM - MINIMUM (P 557).

MIDSPREAD       Q(3) - Q(1), ALSO CALLED THE INTERQUARTILE
                DISTANCE.

M3              THIRD CENTRAL MOMENT. UNBIASED ESTIMATOR IS USED.
                (P 558)

M4              FOURTH CENTRAL MOMENT. UNBIASED ESTIMATOR IS USED.
                (P 553)

COEFFICIENT     M3 ÷ (STD DEV)*3
OF SKEWNESS

COEFFICIENT     (M4 ÷ (STD DEV)*4) - 3
OF KURTOSIS

BETA1           BIASED ESTIMATE OF THIRD CENTRAL MOMENT. CAN BE
                USED IN TESTING FOR NORMALITY. (BIOMETRIKA TABLES
                FOR STATISTICIANS, 1966).

BETA2           BIASED ESTIMATE OF FOURTH CENTRAL MOMENT.
                (BIOMETRIKA TABLES FOR STATISTICIANS, 1966).

MAXIMUM         LARGEST SAMPLE VALUE.

MINIMUM         SMALLEST SAMPLE VALUE.

SAMPLE          THE α-QUANTILE, X(α), IS THE SOLUTION TO THE EQ.
QUANTILES       PROBABILITY(X <= X(α)) = α.
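
As an illustration only, several of the definitions listed above
can be sketched in Python rather than APL. This is not the HIST
source. The function names sample_quantile and describe are
hypothetical, the clamp on J is an added safeguard, and counting
values equal to a quartile as lying "between" the quartiles is an
assumption, since HISTHOW does not say how ties are treated.

```python
import math
import statistics


def sample_quantile(ordered, q):
    """Return the J-th ordered value, J = [q x N] + 1 (1-based), per HISTHOW."""
    n = len(ordered)
    j = min(math.floor(q * n) + 1, n)  # clamp is an added safeguard, not from HISTHOW
    return ordered[j - 1]


def describe(data):
    """Compute a subset of the HIST descriptive statistics for a sample."""
    x = sorted(data)
    q1, q2, q3 = (sample_quantile(x, q) for q in (0.25, 0.50, 0.75))
    mean = statistics.mean(x)
    std_dev = statistics.stdev(x)  # square root of the unbiased (N - 1) variance
    # Assumption: values equal to a quartile count as "between" the quartiles.
    inner = [v for v in x if q1 <= v <= q3]
    return {
        "mean": mean,
        "median": statistics.median(x),
        "quartiles": (q1, q2, q3),
        "trimean": 0.25 * (q1 + 2 * q2 + q3),
        "midmean": statistics.mean(inner),
        "midrange": (x[0] + x[-1]) / 2,
        "variance": statistics.variance(x),
        "std_dev": std_dev,
        # Zero when the mean is negligible; using |mean| here is an assumption.
        "coef_of_variation": 0.0 if abs(mean) < 1e-30 else std_dev / abs(mean),
        "range": x[-1] - x[0],
        "midspread": q3 - q1,
    }


stats = describe(range(1, 11))
print(stats["quartiles"], stats["trimean"], stats["midmean"])
```

Note that under these definitions the second sample quartile need
not equal the median: for the ten values 1 through 10 the median
is the average of the two middle values, while Q(2) is the sixth
ordered value.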



With this complete description the general user should
be able to take full advantage of HIST and put to use all
its options.
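
The smoothed empirical density defined in HISTHOW can be sketched
as follows. This is a minimal Python illustration of the stated
relation, not the APL code of HIST; the names weight, bandwidth,
smoothed_density and density_curve, and the number of grid points,
are hypothetical. The sample must have a nonzero range, which HIST
also requires.

```python
import math


def weight(z):
    """Triangular weight function W: 0 if |z| > 1, otherwise 1 - |z|."""
    return 0.0 if abs(z) > 1 else 1.0 - abs(z)


def bandwidth(data):
    """Bandwidth B(N) = RANGE / SQRT(N)."""
    return (max(data) - min(data)) / math.sqrt(len(data))


def smoothed_density(data, z):
    """F_N(z) = (1 / (N x B(N))) x sum over i of W((X_i - z) / B(N))."""
    n = len(data)
    b = bandwidth(data)
    return sum(weight((x - z) / b) for x in data) / (n * b)


def density_curve(data, points=50):
    """Evaluate F_N on an evenly spaced grid from the sample minimum to maximum."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (points - 1)
    return [(lo + k * step, smoothed_density(data, lo + k * step))
            for k in range(points)]


sample = [0.0, 1.0, 2.0, 3.0]
print(round(smoothed_density(sample, 1.5), 4))  # prints 0.2222
```

For the four-point sample the bandwidth is 3 / 2 = 1.5, so only
the two middle points carry weight at z = 1.5, each contributing
2/3, which gives (4/3) / (4 x 1.5) = 2/9.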

B. USAGE WITH TELEPHONE DATA 1 AND TELEPHONE
DATA 2, OFFLINE, ALL DATA, ECDF, AND TITLE



HIST was now used on two sets of data. Both telephone
data 1 and telephone data 2 were first used with the offline
printer, demonstrating the title option, the empirical
density function option and the all-data option, in which
any data points outside the designated interval are lumped
into the end cells. When HIST was typed the following
responses to each of the queries were entered.

HIST

ENTER DATA IN VECTOR FORM
□:

TELDAT1

IF YOU ALSO WANT A SMOOTHED EMPIRICAL DENSITY FUNCTION ENTER
A 1. IF YOU DO NOT WANT IT ENTER A 0
□:

1

IF YOU WANT TO TITLE YOUR HISTOGRAM TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE RETURN.

TELEPHONE DATA 1

IF YOU WANT TO SET THE NUMBER OF CELLS AND THE SCALE ENTER
FIRST THE NUMBER OF CELLS (AN INTEGER BETWEEN 10 AND 28)
FOLLOWED BY A SPACE AND THEN YOUR LEFT SCALE POINT FOLLOWED
BY A SPACE AND THEN YOUR RIGHT SCALE POINT. HOWEVER, IF YOU
WANT HIST TO AUTOMATICALLY SCALE ENTER 000
□:

28 0 20000

GIVEN THAT YOU HAVE SET YOUR OWN SCALE, TO INCLUDE DATA
POINTS THAT MIGHT BE OUTSIDE YOUR SCALE LIMITS IN THE END
CELLS, TYPE 1. IF YOU DESIGNATED AUTOSCALE ALSO, TYPE
1. IF HOWEVER, YOU DO NOT WANT THE DATA OUTSIDE THE SCALE
LIMITS INCLUDED IN THE HISTOGRAM, TYPE 0
□:

1

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE PRINTER,
TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR ON YOUR
TERMINAL, TYPE 0. (NOTE IF YOU TYPED 0 BE SURE YOUR
TERMINALS CARRIAGE PAGE SETTING IS ON THE MAXIMUM WIDTH)
□:

1

HISTOGRAM SENT TO PRINTER

Note that telephone data 1 was contained in the variable
TELDAT1 and that the number of cells chosen was 28 with the
left scale point being 0 and the right scale point being
20,000.

After the response - HISTOGRAM SENT TO PRINTER - was
received, HIST was again typed under identical conditions
and telephone data 2 was entered through the variable
TELDAT2.

HIST

ENTER DATA IN VECTOR FORM
□:

TELDAT2

IF YOU ALSO WANT A SMOOTHED EMPIRICAL DENSITY FUNCTION ENTER
A 1. IF YOU DO NOT WANT IT ENTER A 0
□:

1

IF YOU WANT TO TITLE YOUR HISTOGRAM TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE RETURN.

TELEPHONE DATA 2

IF YOU WANT TO SET THE NUMBER OF CELLS AND THE SCALE ENTER
FIRST THE NUMBER OF CELLS (AN INTEGER BETWEEN 10 AND 28)
FOLLOWED BY A SPACE AND THEN YOUR LEFT SCALE POINT FOLLOWED
BY A SPACE AND THEN YOUR RIGHT SCALE POINT. HOWEVER, IF YOU
WANT HIST TO AUTOMATICALLY SCALE ENTER 000
□:

28 0 20000

GIVEN THAT YOU HAVE SET YOUR OWN SCALE, TO INCLUDE DATA
POINTS THAT MIGHT BE OUTSIDE YOUR SCALE LIMITS IN THE END
CELLS, TYPE 1. IF YOU DESIGNATED AUTOSCALE ALSO, TYPE
1. IF HOWEVER, YOU DO NOT WANT THE DATA OUTSIDE THE SCALE
LIMITS INCLUDED IN THE HISTOGRAM, TYPE 0
□:

1

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE PRINTER,
TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR ON YOUR
TERMINAL, TYPE 0. (NOTE IF YOU TYPED 0 BE SURE YOUR
TERMINALS CARRIAGE PAGE SETTING IS ON THE MAXIMUM WIDTH)
□:

1

HISTOGRAM SENT TO PRINTER

Now by looking at figure 1 (output for telephone data 1)
and figure 2 (output from telephone data 2) the similarities
and differences in the histograms can be compared. Without
getting into specifics, the empirical density function plot
seems to indicate that both sets of data are similar.
However, times-between-errors equal to one dominate the data;
a more detailed discussion of this data and its analysis is
given in Section VIII.



FIGURE 1 (HIST output for telephone data 1; plot not reproduced)

FIGURE 2 (HIST output for telephone data 2; plot not reproduced)


C. USAGE WITH TELEPHONE DATA 1 AND TELEPHONE DATA 2, ON
LINE, CONDITIONAL DATA BETWEEN 2 AND 140, ECDF, AND TITLE

Because both sets of data contain

(1) a large number of elements,

(2) a large number of times-between-errors equal to 1
(this becomes more apparent when HISTLIST is
described), and

(3) an extensive range,

it would appear that the conditional option available on
HIST could be used to see if the two data sets are in fact
similar over a smaller interval. This was done using the
on-line (terminal) option, the empirical density function
option, the title option and the conditional option
with any data points outside the designated interval excluded
from the histogram.



HIST

ENTER DATA IN VECTOR FORM
□:

TELDAT1

IF YOU ALSO WANT A SMOOTHED EMPIRICAL DENSITY FUNCTION ENTER
A 1. IF YOU DO NOT WANT IT ENTER A 0
□:

1

IF YOU WANT TO TITLE YOUR HISTOGRAM TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE RETURN.

TELEPHONE DATA 1 BETWEEN 2 AND 140

IF YOU WANT TO SET THE NUMBER OF CELLS AND THE SCALE ENTER
FIRST THE NUMBER OF CELLS (AN INTEGER BETWEEN 10 AND 28)
FOLLOWED BY A SPACE AND THEN YOUR LEFT SCALE POINT FOLLOWED
BY A SPACE AND THEN YOUR RIGHT SCALE POINT. HOWEVER, IF YOU
WANT HIST TO AUTOMATICALLY SCALE ENTER 000
□:

28 2 140

GIVEN THAT YOU HAVE SET YOUR OWN SCALE, TO INCLUDE DATA
POINTS THAT MIGHT BE OUTSIDE YOUR SCALE LIMITS IN THE END
CELLS, TYPE 1. IF YOU DESIGNATED AUTOSCALE ALSO, TYPE
1. IF HOWEVER, YOU DO NOT WANT THE DATA OUTSIDE THE SCALE
LIMITS INCLUDED IN THE HISTOGRAM, TYPE 0
□:

0

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE PRINTER,
TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR ON YOUR
TERMINAL, TYPE 0. (NOTE IF YOU TYPED 0 BE SURE YOUR
TERMINALS CARRIAGE PAGE SETTING IS ON THE MAXIMUM WIDTH)
□:

0



Note that the same variable TELDAT1 is used but this
time the interval was between 2 and 140. Also, the response
- HISTOGRAM SENT TO PRINTER - was not printed because the
on-line (terminal) option was employed.

After the output for telephone data 1 was printed, HIST
was again typed and telephone data 2 was entered under
identical conditions.

HIST

ENTER DATA IN VECTOR FORM
□:

TELDAT2

IF YOU ALSO WANT A SMOOTHED EMPIRICAL DENSITY FUNCTION ENTER
A 1. IF YOU DO NOT WANT IT ENTER A 0
□:

1

IF YOU WANT TO TITLE YOUR HISTOGRAM TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE RETURN.

TELEPHONE DATA 2 BETWEEN 2 AND 140

IF YOU WANT TO SET THE NUMBER OF CELLS AND THE SCALE ENTER
FIRST THE NUMBER OF CELLS (AN INTEGER BETWEEN 10 AND 28)
FOLLOWED BY A SPACE AND THEN YOUR LEFT SCALE POINT FOLLOWED
BY A SPACE AND THEN YOUR RIGHT SCALE POINT. HOWEVER, IF YOU
WANT HIST TO AUTOMATICALLY SCALE ENTER 000
□:

28 2 140

GIVEN THAT YOU HAVE SET YOUR OWN SCALE, TO INCLUDE DATA
POINTS THAT MIGHT BE OUTSIDE YOUR SCALE LIMITS IN THE END
CELLS, TYPE 1. IF YOU DESIGNATED AUTOSCALE ALSO, TYPE
1. IF HOWEVER, YOU DO NOT WANT THE DATA OUTSIDE THE SCALE
LIMITS INCLUDED IN THE HISTOGRAM, TYPE 0
□:

0

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE PRINTER,
TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR ON YOUR
TERMINAL, TYPE 0. (NOTE IF YOU TYPED 0 BE SURE YOUR
TERMINALS CARRIAGE PAGE SETTING IS ON THE MAXIMUM WIDTH)
□:

0



Figure 3 (output from telephone data 1 between 2 and
140) and figure 4 (output from telephone data 2 between 2
and 140) now appear quite different in shape based on the
empirical density function plot. The difference was hidden
in the earlier histograms by the extensive range of the data
(85,993 for telephone data 1 and 67,271 for telephone data 2)
and by the large number of times-between-errors equal to one.
Both sets of data are actually discrete, only occurring at
multiples of 1, but as an initial analysis the data sets were
treated as continuous. Thus, by employing the conditional
option available on HIST, differences in the two sets of data
become quite apparent, whereas before the differences were
not so easily detected.
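
The two treatments of points outside the scale limits used in
sections B and C can be sketched as follows. This is a Python
illustration under stated assumptions, not the HIST algorithm:
the function name cell_counts is hypothetical, and the rule that
a point on an interior cell boundary falls in the cell to its
right is an assumed convention.

```python
def cell_counts(data, n_cells, left, right, include_outside):
    """Count data points per histogram cell on the interval [left, right].

    include_outside=True lumps points beyond the scale limits into the
    end cells; include_outside=False drops them, which gives the
    histogram for the conditional density.
    """
    width = (right - left) / n_cells
    counts = [0] * n_cells
    for x in data:
        if x < left or x > right:
            if not include_outside:
                continue  # conditional option: ignore the point
            index = 0 if x < left else n_cells - 1
        else:
            # Assumed convention: boundary points go to the cell on the right,
            # except the right scale point itself, which stays in the last cell.
            index = min(int((x - left) / width), n_cells - 1)
        counts[index] += 1
    return counts


def relative_frequencies(counts):
    """Divide each cell count by the number of points actually counted."""
    total = sum(counts)
    return [c / total for c in counts]


data = [1, 1, 2, 15, 16, 139, 140, 5000]
print(cell_counts(data, 10, 2, 140, include_outside=True))
print(cell_counts(data, 10, 2, 140, include_outside=False))
```

With the points outside the limits excluded, the relative
frequencies are taken over the five retained points only, so a
dominant value such as 1 no longer compresses the rest of the
histogram.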

III. LISTING ROUTINE



A. DESCRIPTION 

The second routine presented is a listing routine. APL
has a function that will automatically sort the data and
print the results. However, the unique feature of HISTLIST
(listing routine) is that it takes advantage of like
occurrences in the data and prints the data in ascending
order in a compressed form. This becomes highly useful when
listing a large number of data points that contain multiple
occurrences. It is also a tool for finding multiplicities in
supposedly continuous data, and a probability function
estimating routine for data which is known to be discrete.

A complete description of how HISTLIST operates is
contained in the variable HISTLISTHOW. When the user types
HISTLISTHOW the following response is printed on the
terminal:

HISTLISTHOW

SYNTAX HISTLIST

HISTLIST IS A HIGHLY CONVENIENT WAY TO LIST YOUR DATA.
HISTLIST TAKES YOUR DATA, ORDERS IT AND COMPRESSES IT. FOR
EXAMPLE, IF THREE DATA POINTS WERE ALL THE SAME VALUE
HISTLIST WOULD JUST PRINT THE VALUE ONCE AND THEN PRINT THE
NUMBER OF OCCURENCES OF THAT VALUE. HISTLIST WILL ALSO
PRINT THE SERIAL NUMBER OF THE DATA, THE PERCENTAGE THIS
SAMPLE VALUE IS TO THE WHOLE SAMPLE, AND A SMALL HISTOGRAM
(STARS) SHOWING RELATIVE PERCENTAGES. EXAMPLE: 6 4 4 3 4

HISTLIST

SERIAL NUM.   ORDERED DATA   NUMBER OF OCCURENCES                  PER CENT

     1             3                  1            ****              .20
     2             4                  3            ************      .60
     5             6                  1            ****              .20

HISTLIST IS IDEALLY SUITED FOR A LARGE SAMPLE THAT COULD
POSSIBLY HAVE A LOT OF LIKE OCCURENCES. HISTLIST FURTHER HAS
THE ADVANTAGE OF BEING USED WITH EITHER THE OFFLINE PRINTER
OR THE USERS TERMINAL.
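
The compression described in HISTLISTHOW can be sketched in
Python as follows. This is an illustration, not the APL source of
HISTLIST. The names histlist and format_rows are hypothetical,
and the scale of 20 stars per unit of relative frequency is an
assumption inferred from the small example above (4 stars for
.20 and 12 stars for .60).

```python
from collections import Counter


def histlist(data):
    """Return (serial number, value, occurrences, fraction) per distinct value.

    The serial number is the 1-based position of the first occurrence
    of the value in the ordered data.
    """
    n = len(data)
    counts = Counter(data)
    rows = []
    serial = 1
    for value in sorted(counts):
        occurrences = counts[value]
        rows.append((serial, value, occurrences, occurrences / n))
        serial += occurrences
    return rows


def format_rows(rows, stars_per_unit=20):
    """Render rows with a star histogram; the star scale is an assumption."""
    lines = []
    for serial, value, occurrences, fraction in rows:
        stars = "*" * round(stars_per_unit * fraction)
        lines.append(f"{serial:>6} {value:>12} {occurrences:>6}  {stars:<20} {fraction:.2f}")
    return lines


for line in format_rows(histlist([6, 4, 4, 3, 4])):
    print(line)
```

Applied to the vector 6 4 4 3 4, the sketch yields serial numbers
1, 2 and 5 for the ordered values 3, 4 and 6, matching the
example listing.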



B. USAGE WITH TELEPHONE DATA 1 AND TELEPHONE DATA 2, OFFLINE

HISTLIST was used with the title option and offline
printer option on both telephone data 1 and telephone data 2.
When HISTLIST was typed the following responses to each of
the queries were entered.

HISTLIST

HISTLIST PRINTS THE SERIAL NUMBER OF THE COMPRESSED
DATA, THE ORDERED DATA COMPRESSED, AND THE NUMBER OF
LIKE OCCURENCES. ENTER YOUR DATA IN VECTOR FORM.
□:

TELDAT1

IF YOU WANT TO TITLE YOUR DATA TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE
RETURN.

TELEPHONE DATA 1

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE
PRINTER TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR
ON YOUR TERMINAL TYPE 0
□:

1

HISTLIST SENT TO PRINTER

After the response - HISTLIST SENT TO PRINTER - was
received, HISTLIST was again typed and telephone data 2 was
entered.

HISTLIST

HISTLIST PRINTS THE SERIAL NUMBER OF THE COMPRESSED
DATA, THE ORDERED DATA COMPRESSED, AND THE NUMBER OF
LIKE OCCURENCES. ENTER YOUR DATA IN VECTOR FORM.
□:

TELDAT2

IF YOU WANT TO TITLE YOUR DATA TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE
RETURN.

TELEPHONE DATA 2

IF YOU WANT YOUR OUTPUT TO APPEAR ON THE OFFLINE
PRINTER TYPE 1. IF YOU WANT YOUR OUTPUT TO APPEAR
ON YOUR TERMINAL TYPE 0
□:

1

HISTLIST SENT TO PRINTER



Looking at figure 5 (output with telephone data 1) and
figure 6 (output with telephone data 2) the listings of the
two data sets can be compared. It can be seen that both
telephone data 1 and telephone data 2 contain a large number
of multiple occurrences of the number one and the number two.
In fact 19% of telephone data 1 is the number one and 24%
of telephone data 2 is the number one. Also, telephone data
2 has many more multiple occurrences in the 120 to 130 range
than telephone data 1. This is quickly apparent when one
looks at the stars to the right of the ordered data.
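
The quoted percentages can be checked with a short calculation.
The sample sizes (672 and 736) are those given in the
introduction. The counts of ones are read from the HISTLIST
listings and should be treated as assumptions: 128 is the first
occurrence count in the telephone data 1 listing, and 178 is
inferred from the serial number 179 printed for the value 2 in
the telephone data 2 listing.

```python
# Sample sizes stated in the introduction.
n_data_1 = 672
n_data_2 = 736

# Counts of the value 1, read from the listings (assumed, see lead-in).
ones_data_1 = 128
ones_data_2 = 178

share_1 = ones_data_1 / n_data_1
share_2 = ones_data_2 / n_data_2

print(round(share_1, 3))  # prints 0.19
print(round(share_2, 3))  # prints 0.242
```

Rounded to whole percentages these give the 19% and 24% quoted
above.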

FIGURE 5A (HISTLIST listing for TELEPHONE DATA 1; columns: SERIAL
NUMBER, ORDERED DATA, NUMBER OF OCCURENCES, PER CENT; listing not
reproduced)

FIGURE 5B (HISTLIST listing for TELEPHONE DATA 1, continued;
listing not reproduced)

FIGURE 5C (HISTLIST listing for TELEPHONE DATA 1, continued; the
listing ends at serial number 672 with the ordered value 85993;
listing not reproduced)

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FIGURE 6A 



[HISTLIST listing with columns SERIAL NUMBER, ORDERED DATA, NUMBER OF OCCURRENCES, PERCENT]

FIGURE 6B

In addition, HISTLIST saved on printing time and paper. By
printing the data in compressed form, HISTLIST saved printing
448 lines (6 additional pages) in the case of telephone data 1
and 419 lines (5 additional pages) in the case of telephone
data 2. Thus, HISTLIST not only gives the user more information
than an ordered listing of the data, but is also cost effective
in terms of printing time and paper used. Finally, note that it
is not possible to look at the data in as much detail with
routine HIST as with HISTLIST. If the data is continuous and
there are no multiplicities, then HISTLIST gives only this
information and an ordered listing of the data. The shape of
the density function can best be seen (estimated) using
routine HIST.

IV. SECTIONING ROUTINE



A. DESCRIPTION 

The third routine presented is the sectioning routine,
HISTS. It gives a way of assessing the variability of
estimates of descriptive statistics from sample data. It is
essential that the data be in random order.

The basic idea is as follows. Assume we have $m$ independent
observations $y_1, y_2, \ldots, y_m$ of a random variable $Y$.
The usual estimate of its mean value $\mu = E(Y)$ is the sample
mean $\bar{y}$, where

$$\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i .$$

Now $\bar{y}$ is the least-squares estimate of $\mu$, and therefore
unbiased with variance $\mathrm{var}(\bar{y}) = \sigma^2/m$, where
$\sigma^2 = \mathrm{var}(Y)$. Of course $\sigma^2$ is unknown, but we
can estimate it from the data with the sample variance

$$s^2 = \frac{1}{m-1} \sum_{i=1}^{m} (y_i - \bar{y})^2$$

and then estimate the variance of the estimate $\bar{y}$ of $\mu$ as

$$\widehat{\mathrm{var}}(\bar{y}) = \frac{s^2}{m} = \frac{1}{m(m-1)} \sum_{i=1}^{m} (y_i - \bar{y})^2 .$$

This is the basis for the sectioning routine: here the $y_i$
are estimates of descriptive statistics from the $m$ sections
of the data and $\bar{y}$ is the average of the statistics from
each section. The estimates are assumed independent because
the original data is assumed to be independent.
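
The sectioning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the APL source of HISTS: the function name `section_summary` and its return keys are hypothetical, and leftover points are simply dropped from the end of the data.

```python
import math
import statistics


def section_summary(data, m, stat=statistics.mean):
    """Sectioning estimate of the variability of a statistic.

    data : observations, assumed to be in random order
    m    : number of sections (leftover points are discarded)
    stat : function mapping a list of numbers to a single number
    """
    size = len(data) // m                  # data points per section
    if m < 2 or size < 1:
        raise ValueError("need at least 2 sections with 1 point each")
    # y_i: the statistic evaluated on the i-th section
    y = [stat(data[i * size:(i + 1) * size]) for i in range(m)]
    y_bar = sum(y) / m                     # mean of the section values
    s2 = sum((v - y_bar) ** 2 for v in y) / (m - 1)   # sample variance
    return {
        "mean": y_bar,
        "variance": s2,
        "std_dev": math.sqrt(s2),
        "std_err": math.sqrt(s2 / m),      # square root of s^2 / m
        "section_size": size,
    }
```

Passing a different `stat` (for example `statistics.median`) gives the same summary for that statistic; the last entry, `std_err`, is the quantity used later for confidence intervals.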

A complete description of how HISTS operates is contained in
the variable HISTSHOW. When the user types HISTSHOW the
following response is printed on the terminal:

HISTSHOW



SYNTAX HISTS

HISTS ALLOWS YOU TO INTERACTIVELY SECTION YOUR DATA AND
ASSESS THE VARIABILITY IN EACH OF THE DESCRIPTIVE STATISTICS
BY USING THE SECTIONED SAMPLE DATA.

WHEN YOU TYPE HISTS YOU WILL BE ASKED TO DESIGNATE THE
NUMBER OF SECTIONS YOU DESIRE. HISTS WILL THEN TAKE
THE UNORDERED DATA AND DIVIDE THE DATA INTO THE NUMBER
OF SECTIONS YOU INDICATE DISCARDING ANY DATA POINTS LEFT
OVER. FOR EXAMPLE, IF YOU HAVE 301 DATA POINTS AND YOU
SELECT 10 SECTIONS HISTS WILL PLACE THE FIRST 30 DATA POINTS
IN THE FIRST SECTION, THE SECOND 30 DATA POINTS IN THE
SECOND SECTION AND SO ON UNTIL THE LAST DATA POINT IS
OMITTED. YOU WILL NOW HAVE 10 SECTIONS WITH 30 DATA POINTS
PER SECTION.

HISTS WOULD NOW PRINT THE FOLLOWING STATISTICS ON EACH OF
THE SECTIONS: MEAN, MEDIAN, VARIANCE, STD DEV, COEF VAR,
SKEWNESS, KURTOSIS, MINIMUM AND MAXIMUM. IN ADDITION, THE
ABOVE STATISTICS WOULD BE PRINTED FOR THE UNSECTIONED DATA
TO ALLOW FOR COMPARISONS.

FINALLY, HISTS WILL PRINT (1) THE MEAN OF THE SECTIONED
DATA STATISTICS. FOR EXAMPLE, THE MEAN FOR SKEWNESS WOULD BE
EACH SECTION VALUE FOR SKEWNESS SUMMED UP AND DIVIDED BY THE
NUMBER OF SECTIONS. (2) THE VARIANCE AND STD DEV OF THE
SECTIONED DATA STATISTICS. AND, (3) THE STD DEV DIVIDED BY
THE SQUARE ROOT OF THE NUMBER OF SECTIONS, WHICH ESTIMATES
THE STANDARD DEVIATION OF THE STATISTICS.

AS A RESULT, HISTS WILL GIVE YOU AN UNBIASED ESTIMATE OF
THE VARIANCE OF THE SAMPLE MEAN, MEDIAN, VARIANCE, STD DEV,
COEF VAR, SKEWNESS AND KURTOSIS FROM USING THE SAMPLE
VARIANCE OF THE SECTIONED DATA. WITH THIS RESULT, CONFIDENCE
INTERVALS CAN ALSO BE OBTAINED FOR EACH OF THE ABOVE
STATISTICS, IF THE ESTIMATES FROM THE SECTIONS ARE NORMALLY
DISTRIBUTED. HISTS IS BEST SUITED FOR LARGE AND MODERATE SIZED
SAMPLES; FOR SMALL SAMPLES JACKNIFING SHOULD BE CONSIDERED.

B. USAGE WITH TELEPHONE DATA 1



HISTS was now used on telephone data 1 to assess the
variability in the mean, median, variance, standard deviation,
coefficient of variation, skewness and kurtosis. When HISTS
was typed the following responses were entered (see figure 7).

The 672 data points of telephone data 1 were broken down
into 16 sections with 42 data points per section. Because
of this breakdown no data points were discarded.

The unsectioned statistics printed can be compared with
the values printed by HIST (figure 1) and are in fact the
same. Provided that the estimates are normally distributed
(this can be checked with the normal plots, described later),
confidence intervals for each of the statistics (mean, median,
variance, standard deviation, coefficient of variation,
skewness and kurtosis) based on the t-statistic can be obtained
in the following manner:

$$\bar{y}_n \pm \frac{s_{\bar{y}_n}}{\sqrt{m}} \; t_{(1-\frac{1}{2}\alpha),(m-1)}$$

Here $\bar{y}_n$ is the mean of the sectioned data statistics
(obtained from column one under summary for sectioned data);
$s_{\bar{y}_n}/\sqrt{m}$ is the standard deviation of the
sectioned data statistic divided by the square root of the
number of sections (obtained from column four under summary
for sectioned data); $m$ is the number of sections chosen; and
$t_{(1-\frac{1}{2}\alpha),(m-1)}$ is the $1-\frac{1}{2}\alpha$
quantile of the t-distribution with $m-1$ degrees of freedom.

FIGURE 7

VARIANCE    4.8637E07   2.6217E15   5.1203E07   1.2801E07
STD DEV     6.0664E03   1.2625E07   3.5532E03   8.8830E02
COEF VAR    4.1175E00   1.5503E00   1.2451E00   3.1128E-01
SKEWNESS    4.9343E00   1.4701E00   1.2125E00   3.0312E-01
KURTOSIS    2.4552E01   1.2484E02   1.1173E01   2.7933E00



C. INTERPRETATION OF RESULTS

As an example, a confidence interval for the coefficient
of variation was obtained in the following manner. The mean
value of the coefficient of variation for the 16 sections is
4.1175 (column 1). The standard deviation divided by the
square root of 16 is .31128 (column 4). Using $\alpha = .05$,
the t value with 15 degrees of freedom is 2.131. Thus, the
95% confidence interval for the coefficient of variation for
telephone data 1 is 4.1175 ± (.31128)(2.131), which is
[3.454, 4.781]. Confidence intervals on the six other
statistics could be obtained in the same fashion.
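
The arithmetic of this interval can be checked with a short Python sketch. The helper name `t_interval` is hypothetical; the three numbers are the ones quoted in the text, and the t value is taken from the text rather than computed.

```python
def t_interval(estimate, std_err, t_value):
    """Return (lower, upper) = estimate -/+ t_value * std_err."""
    half_width = t_value * std_err
    return estimate - half_width, estimate + half_width


# Coefficient of variation, 16 sections, t(0.975, 15) = 2.131 as quoted
lower, upper = t_interval(4.1175, 0.31128, 2.131)
print(f"[{lower:.3f}, {upper:.3f}]")   # prints [3.454, 4.781]
```

The same helper applies to the other six statistics once their column one and column four values are read from figure 7.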

Again note that the use of the variance estimate from
the sectioned data to give confidence intervals is based on
the assumption that the estimates from the sections are
independent and normally distributed. The normality will
depend on the number of observations in each section, which
should be kept large to induce normality. This requirement
conflicts with the need to make the number of sections large
to reduce the variability in the estimate of the variance of
the statistics.

Another problem is that if the number of observations in
each section is small, the estimates may be severely biased.
This effect can be seen in figure 7: note that all of the 16
estimates of skewness from the sections are smaller than the
estimate 7.1531 from the unsectioned data.

V. JACKNIFE ROUTINE



A. DESCRIPTION

The fourth routine presented is the jacknife routine,
HISTJACK. It is another way of assessing the variability in
the estimates from sample data, and also of reducing bias in
estimates of the descriptive statistics.

The jacknife procedure, like the previous sectioning
method, is based on the assumption that an independent and
identically distributed random sample $x_1, x_2, \ldots, x_n$
has come from a population with an unknown distribution
function $F_X(x)$. If we divide the sample into $r$ groups,
with each group containing the same number of elements, we can
obtain estimates $\hat{\theta}$ of the descriptive statistics,
which we denote generically as $\theta$, in the same manner as
previously done with the sectioning method. The difference
here is that the descriptive statistics are computed with the
$j$th group deleted, $j = 1, 2, \ldots, r$. We then let
$\hat{\theta}_{(j)}$ be the descriptive statistic estimate
computed with the $j$th subgroup omitted, and
$\hat{\theta}_{all}$ the corresponding descriptive statistic
estimated from the entire sample (no group omitted). The
jacknife pseudo-values are then computed in the following way:

$$\theta^*_j = r\,\hat{\theta}_{all} - (r-1)\,\hat{\theta}_{(j)}, \qquad j = 1, 2, \ldots, r$$

Then we define the jacknifed estimator to be:

$$\theta^* = \frac{1}{r} \sum_{j=1}^{r} \theta^*_j$$

The pseudo-values can be used to obtain variance estimates
for $\theta^*$, and to set approximate confidence limits, using
Student's t. The idea is that the pseudo-values will be
approximately independent and possibly normally distributed.
The jacknifed estimator $\theta^*$ is a sample average, so we
form an estimate $s^{*2}$ of its variance given by the
following relationship (Miller, 1974):

$$s^2 = \frac{1}{r-1} \sum_{j=1}^{r} (\theta^*_j - \theta^*)^2, \qquad s^{*2} = s^2 / r$$

This procedure is particularly useful if the number $n$ of
data points is small, but it must be used with care. Note
that the estimator $\theta^*$ is designed to eliminate a $1/n$
bias term in the estimator $\hat{\theta}$.
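
The pseudo-value computation can be sketched in Python as follows. This is a minimal illustration of the formulas above, not the APL source of HISTJACK; the function name `jacknife` is hypothetical, and it is an assumption here that leftover points are dropped before the full-sample estimate is computed.

```python
import math
import statistics


def jacknife(data, r, stat=statistics.mean):
    """Grouped jacknife of a statistic.

    Returns (theta_star, s_star, pseudo_values), where s_star is the
    square root of s^2 / r, the estimated std dev of theta_star.
    """
    data = list(data)
    size = len(data) // r                  # points per group
    if r < 2 or size < 1:
        raise ValueError("need at least 2 groups with 1 point each")
    used = data[:r * size]                 # leftover points are dropped
    theta_all = stat(used)                 # estimate with no group omitted
    pseudo = []
    for j in range(r):
        # estimate with the j-th group deleted
        kept = used[:j * size] + used[(j + 1) * size:]
        pseudo.append(r * theta_all - (r - 1) * stat(kept))
    theta_star = sum(pseudo) / r           # jacknifed estimator
    s2 = sum((p - theta_star) ** 2 for p in pseudo) / (r - 1)
    return theta_star, math.sqrt(s2 / r), pseudo
```

Setting `r` equal to the number of data points gives one point per group, which corresponds to the complete jacknife.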

A complete description of how HISTJACK operates is contained
in the variable HISTJACKHOW. When the user types HISTJACKHOW
the following response is printed on the terminal.

HISTJACKHOW



SYNTAX HISTJACK

HISTJACK ALLOWS YOU TO INTERACTIVELY JACKNIFE YOUR DATA
AND ASSESS THE VARIABILITY IN EACH OF THE STATISTICAL
ESTIMATES BY USING THE SAMPLE DATA.

WHEN YOU TYPE HISTJACK YOU WILL BE ASKED TO DESIGNATE THE
NUMBER OF GROUPS YOU DESIRE. HISTJACK WILL TAKE THE
UNORDERED DATA AND DIVIDE THE DATA INTO THE NUMBER OF
GROUPS YOU INDICATE DISCARDING ANY DATA POINTS LEFT OVER.
FOR EXAMPLE, IF YOU HAVE 22 DATA POINTS AND YOU SELECT 7
GROUPS HISTJACK WILL PLACE THE FIRST 3 DATA POINTS IN GROUP
1, THE SECOND 3 DATA POINTS IN GROUP 2, AND SO ON UNTIL THE
LAST DATA POINT IS OMITTED. YOU WOULD NOW HAVE 7 GROUPS
WITH 3 DATA POINTS PER GROUP. IF YOU HAD ELECTED TO DO A
COMPLETE JACKNIFE, THAT IS TYPED 22, YOU WOULD NOW HAVE 22
GROUPS WITH 1 DATA POINT OMITTED PER GROUP.

HISTJACK WOULD NOW PERFORM STATISTICAL COMPUTATIONS USING
THE JACKNIFE PROCEDURE. THAT IS, BY OMITTING ONE GROUP AT A
TIME, STARTING WITH THE FIRST GROUP, HISTJACK WOULD PRINT
THE FOLLOWING STATISTICS: MEAN, MEDIAN, VARIANCE, STD DEV,
COEF VAR, SKEWNESS, KURTOSIS, MINIMUM AND MAXIMUM. IN
ADDITION, THE ABOVE STATISTICS WOULD BE PRINTED FOR THE
UNGROUPED DATA TO ALLOW FOR COMPARISONS. (NOTE, THE COLUMNS
GIVE THE STATISTIC ESTIMATED FROM ALL THE DATA WITH ONE
GROUP MISSING, AND NOT THE PSEUDO-VALUES)

FINALLY, HISTJACK WILL PRINT (1) THE JACKNIFE ESTIMATE
(2) THE SAMPLE VARIANCE OF THE PSEUDO-VALUES DERIVED IN THE
JACKNIFE ESTIMATE (3) AND, THE ESTIMATED STD DEV OF THE
JACKNIFE ESTIMATE DIVIDED BY THE SQUARE ROOT OF THE NUMBER
OF GROUPS.

AS A RESULT, HISTJACK WILL GIVE YOU AN ESTIMATE OF THE
VARIANCE OF THE SAMPLE MEAN, MEDIAN, VARIANCE, STD DEV, COEF
VAR, SKEWNESS AND KURTOSIS USING THE SAMPLE VARIANCE OF THE
JACKNIFED DATA. WITH THIS RESULT, CONFIDENCE INTERVALS CAN
BE OBTAINED FOR EACH OF THE ABOVE STATISTICS, AGAIN ASSUMING
THAT THE PSEUDO-VALUES ARE APPROXIMATELY INDEPENDENT AND
NORMALLY DISTRIBUTED. HISTJACK IS BEST SUITED FOR SMALL
SAMPLES.

B. USAGE WITH TELEPHONE DATA 1



HISTJACK was now used on telephone data 1 to assess the
variability in the mean, median, variance, standard deviation,
coefficient of variation, skewness and kurtosis. When
HISTJACK was typed the following responses were entered
(see figure 8).

The 672 data points were broken down into 16 groups with
42 data points per group. Again, because of this breakdown
no data points were discarded.

The ungrouped statistics printed are again the same
values that were printed by HIST (figure 1). Using the
jacknife method, confidence intervals for each of the
statistics (mean, median, variance, standard deviation,
coefficient of variation, skewness and kurtosis) can be
obtained in the following manner:

$$\theta^* \pm s^* \; t_{(1-\frac{1}{2}\alpha),(r-1)}$$

Here $\theta^*$ is the jacknife estimate of the sample data
(obtained from column one under summary for jacknifed data);
$s^*$ is the jacknife estimate of the standard deviation
divided by the square root of the number of groups (obtained
from column four under summary for jacknifed data); $r$ is
the number of groups chosen; and
$t_{(1-\frac{1}{2}\alpha),(r-1)}$ is the $1-\frac{1}{2}\alpha$
quantile of the t-distribution with $r-1$ degrees of freedom.
The basis for these assertions about the confidence intervals
using the jacknifing technique is asymptotic and great care
must be taken in using them.

FIGURE 8

C. INTERPRETATION OF RESULTS

To compare the confidence interval obtained for the
coefficient of variation using the sectioning routine with
that obtained using the jacknife routine the following was
done. The jacknife estimate of the coefficient of variation
for the 16 groups is 4.5053 (column 1). The jacknife estimate
of the standard deviation divided by the square root of
16 is .3894. Using $\alpha = .05$, the t value with 15 degrees
of freedom is 2.131. Thus, the 95% confidence interval for
the coefficient of variation for telephone data 1 is 4.5053
± (.3894)(2.131), which is [3.676, 5.335]. This compares
with the confidence interval of [3.454, 4.781] using the
sectioning routine described in section IV. Likewise,
confidence intervals on the remaining six statistics could be
obtained in a similar manner. Note that the values obtained
for the skewness coefficient from the sections are now not
evidently biased; of the 16 values, 7 are below the
value 7.1531 for all the data.

D. USAGE WITH COST OVERRUN DATA

To demonstrate how the complete jacknife could be used
and why it is better to use when possible, the following was
done. The 22 data points of the cost overrun data were used
with the jacknife routine (HISTJACK). When HISTJACK was
typed the data was entered in the variable YROVR and 22 was
typed as the number of groups. By typing 22, which is the
same as the number of data points, a complete jacknife was
done.

Looking at the output from the complete jacknife (figure
9), the cost overrun data can be studied. One can note that
by using the complete jacknife the mean, median, and variance
of the jacknife estimate (column one under summary for
jacknifed data) are the same value as the ungrouped mean,
median and variance. But also note that the coefficient of
variation is less than zero, which can happen when using the
jacknife technique.
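
The agreement of the complete jacknife with the ungrouped mean, median and variance can be illustrated with a small Python sketch. The sample below is hypothetical (it is not the cost overrun data), and the helper name `complete_jacknife` is an assumption. The median agreement relies on the sample size being even, as it is for the 22 cost overrun points, and the variance used is the unbiased sample variance.

```python
import statistics


def complete_jacknife(data, stat):
    """Leave-one-out jacknife estimate: the mean of the pseudo-values."""
    data = list(data)
    n = len(data)
    theta_all = stat(data)
    pseudo = [
        n * theta_all - (n - 1) * stat(data[:j] + data[j + 1:])
        for j in range(n)
    ]
    return sum(pseudo) / n


# Hypothetical sample with an even number of points
sample = [-14.0, -3.5, 0.0, 2.5, 4.0, 9.0, 17.5, 26.0]

for name, stat in [("mean", statistics.mean),
                   ("median", statistics.median),
                   ("variance", statistics.variance)]:
    # each line shows the ungrouped value next to the jacknife estimate
    print(name, stat(sample), complete_jacknife(sample, stat))
```

Statistics such as the coefficient of variation are not reproduced in this way, which is consistent with the jacknife estimate differing from the ungrouped value in figure 9.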
VI. EXPONENTIAL PLOTTING ROUTINE



A. DESCRIPTION

The fifth routine presented is an exponential plotting
routine. Routine EXPONP is a way of plotting the data to
see if it "fits" an exponential distribution, and also to
give some indication of what alternative distributions could
be used if the exponential hypothesis is rejected.

A complete description of how EXPONP operates is contained
in the variable EXPONPHOW. When the user types
EXPONPHOW the following response is printed on the terminal.

EXPONPHOW 



SYNTAX EXPONP

EXPONP ORDERS THE DATA X(I) AND COMPUTES
THE EMPIRICAL LOG SURVIVER FUNCTION FOR THE DATA.
THAT IS,

      X(I)  VS  LN( 1 - I/(N+1) )

THE ORDERED DATA IS PLOTTED AGAINST THE LOG SURVIVER
FUNCTION TO SEE IF THERE IS A LINEAR FIT.
EXPONP ALSO ALLOWS YOU TO TITLE YOUR PLOT.
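
The plotting positions described in this response can be sketched in Python. This is an illustration only, not the APL source of EXPONP; the function name is hypothetical, and it is an assumption that the plotted quantity is ln(1 - i/(n+1)) (negate it to obtain positive exponential scores).

```python
import math


def log_survivor_points(data):
    """Pair each ordered value x_(i) with ln(1 - i/(n+1)), i = 1..n."""
    x = sorted(data)
    n = len(x)
    return [(x[i - 1], math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]


for value, score in log_survivor_points([3.0, 1.0, 2.0]):
    print(value, score)
```

For exponentially distributed data with rate lambda, the survivor function is exp(-lambda x), so these points should fall near a straight line through the origin with slope -lambda.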



B. USAGE WITH TELEPHONE DATA 1 

EXPONP was used with telephone data 1 to see if the
data plotted as a relatively straight line. When EXPONP was
typed the following responses were entered.

EXPONP

EXPONP ORDERS THE DATA YOU GIVE AND COMPUTES THE
EMPIRICAL LOG SURVIVER FUNCTION FOR THE DATA.
A PLOT OF THE LOG SURVIVER FUNCTION FOR THE DATA
IS THEN PRINTED TO SEE IF THERE IS A LINEAR FIT.

IF YOU WANT TO TITLE YOUR PLOT TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE
RETURN.

TELEPHONE DATA 1

ENTER YOUR DATA IN VECTOR FORM
□:

TELDAT1



Looking at figure 10 (plot of telephone data 1 using
EXPONP), it was found that the data did not plot linearly from
the origin, but that the data did appear somewhat linear in
the tail (5,000 to 90,000 range).



C. USAGE WITH RANDOMLY GENERATED EXPONENTIALLY DISTRIBUTED
SAMPLE WITH MEAN SAME AS TELEPHONE DATA 1

As a comparison, EXPONP was used with a randomly generated
exponential sample with the same mean as telephone data
1 (figure 11). As expected, this plot is, within the limits of
sample fluctuations, linear from the origin; it is in fact what
telephone data 1 would have looked like if the data were truly
exponential. The quantization caused by the coarseness of
the APL type-ball is evident in this plot. The sample size
is 672, but not all of these points can be plotted separately.

[Plot with axis label EXPONENTIAL SCORES]

FIGURE 10

FIGURE 11

VII. NORMAL PLOTTING ROUTINE



A. DESCRIPTION 

The final routine presented is a normal plotting routine.
Routine NORMP is a way of plotting the data to see if it
"fits" a normal distribution. In particular one might want
to look at estimates of descriptive statistics obtained from
sections and groups in routines HISTS and HISTJACK.

A complete description of how NORMP operates is contained
in the variable NORMPHOW. When the user types NORMPHOW
the following response is printed on the terminal.



NORMPHOW



SYNTAX NORMP

NORMP ORDERS THE DATA X(I) AND COMPUTES THE
INVERSE OF THE UNIT NORMAL CUMULATIVE DISTRIBUTION.
THAT IS,

      X(I)  VS  Φ^-1( I/(N+1) )

THE ORDERED DATA IS PLOTTED AGAINST THE INVERSE OF
THE UNIT NORMAL CUMULATIVE DISTRIBUTION TO SEE
IF THERE IS A LINEAR FIT. NORMP ALSO ALLOWS YOU
TO CONVIENTLY TITLE YOUR PLOT.
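
The normal plotting positions can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the APL source of NORMP: the function name is hypothetical, and `statistics.NormalDist.inv_cdf` stands in for whatever inverse normal approximation the routine uses.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered value x_(i) with the unit normal quantile
    at probability i/(n+1), i = 1..n."""
    x = sorted(data)
    n = len(x)
    unit_normal = NormalDist()             # mean 0, standard deviation 1
    return [(x[i - 1], unit_normal.inv_cdf(i / (n + 1)))
            for i in range(1, n + 1)]


for value, score in normal_plot_points([5.0, 1.0, 3.0]):
    print(value, score)
```

If the data are normal with mean mu and standard deviation sigma, the ordered values plotted against these scores should lie near a line with intercept mu and slope sigma.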
B. USAGE WITH COST OVERRUN DATA

NORMP was used with the cost overrun data to see if
the data plotted as a relatively straight line. When NORMP
was typed the following responses were entered.



NORMP

NORMP ORDERS THE DATA YOU GIVE AND COMPUTES THE
INVERSE OF THE UNIT NORMAL CUMULATIVE DISTRIBUTION
FOR THE DATA. A PLOT OF THE INVERSE OF THE
UNIT NORMAL CUMULATIVE DISTRIBUTION VS THE ORDERED
DATA IS THEN PRINTED TO SEE IF THERE IS A
LINEAR FIT.

IF YOU WANT TO TITLE YOUR PLOT TYPE YOUR TITLE.
IF YOU DO NOT WANT A TITLE JUST HIT THE CARRIAGE
RETURN.



COST OVERRUNS

ENTER YOUR DATA IN VECTOR FORM

□:

YROVR



Note that the cost overrun data was contained in the
variable YROVR. Looking at figure 12 (plot of cost overrun
data using NORMP), it was found that the data did in
fact plot fairly linearly through the range -14 to 26
(formal tests are available; see Wilk & Gnanadesikan, 1968).

C. USAGE WITH NORMAL SAMPLE GENERATED WITH MEAN AND
VARIANCE THE SAME AS COST OVERRUN DATA

As a comparison, NORMP was used with a normal sample
with the same mean and variance as the cost overrun data
(figure 13). As expected, this plot is very linear. It is
also not much different from that of figure 12, which lends
credence to the hypothesis that the cost overrun data might
in fact be normally distributed.

[Plot titled COST OVERRUNS; axis label ORDERED DATA]

FIGURE 12

[Plot titled NORMAL SAMPLE GENERATED WITH SAME MEAN AND VARIANCE AS COST OVERRUN DATA; axis label ORDERED DATA]

FIGURE 13



D. USAGE WITH COEFFICIENT OF VARIATION DATA OBTAINED
FROM USING SECTIONING ROUTINE

In order to check for normality in the sectioned estimates
obtained from using HISTS (sectioning routine) the
following was done. The 16 coefficient of variation
values obtained from using HISTS with telephone data 1
(column 5, figure 7) were entered as a vector into NORMP.
Figure 14 shows that the plot is marginally linear. This
demonstrates the need for formal tests to verify normality
in the absence of a strictly linear plot (Wilk & Gnanadesikan,
1968).

[Plot titled PLOT OF COEF VAR VALUES USING 16 SECTIONS FROM FIGURE; axis label ORDERED DATA]

FIGURE 14



VIII. THE INDEPENDENCE AND MARKOV CHAIN 
HYPOTHESES FOR THE TELEPHONE DATA 



The telephone data used in the thesis (Lewis & Cox, 1966)
actually consists of binary bits transmitted over telephone
lines, together with the information that the bit transmitted
at time i, i = 0,1,2,..., is in error or not. This information
is characterized by a sequence of binary-valued random
variables x(i), i = 0,1,..., where x(i)=1 means that the bit
transmitted at time i is in error, while x(i)=0 means that the
bit transmitted at time i is correctly transmitted.

In telephone data 1 there are 672 ones and 1,105,476
zeros, and a much more compact and equivalent representation
of the data is obtained via the sequence of random variables
y(j), j = 1,2,..., where y(j) is one plus the number of
correctly transmitted bits between the jth and (j-1)st bit
errors, with the convention that y(j)=1 if the errors occur on
adjacent transmitted bits, and y(1) is the time from i=0 to the
first incorrectly transmitted bit. The y(j) are called the
times-between-errors.
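
The conversion from the bit sequence x(i) to the times-between-errors y(j) can be sketched in Python. The function name is hypothetical, and it is an assumption that y(1) is taken as the index of the first error, measured from i = 0.

```python
def times_between_errors(bits):
    """Convert a 0/1 error sequence x(0), x(1), ... into y(1), y(2), ...

    For j >= 2, y(j) is the difference between the positions of
    consecutive errors, i.e. one plus the number of correct bits
    between them. y(1) is the position of the first error.
    """
    previous = 0
    y = []
    for i, bit in enumerate(bits):
        if bit == 1:
            y.append(i - previous)
            previous = i
    return y


print(times_between_errors([0, 0, 1, 1, 0, 0, 1]))   # prints [2, 1, 3]
```

In the example the second and third bits in error are adjacent, giving y(2) = 1, and two correct bits separate the last two errors, giving y(3) = 3.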

A null hypothesis for the error structure which could be
examined is that errors occur independently at each bit with
a fixed probability, i.e.

$$P\{x(i)=1\} = \pi(1), \qquad i = 0, 1, \ldots$$

$$P\{x(i)=0\} = \pi(0) = 1 - \pi(1), \qquad i = 0, 1, \ldots$$

The y(j)'s then are independent and geometrically
distributed, since

$$P\{y(j)=1\} = P\{\text{if } (j-1)\text{st error at time } i;\ j\text{th at time } i+1\} = \pi(1)$$

$$P\{y(j)=2\} = P\{\text{if } (j-1)\text{st error at time } i;\ j\text{th at time } i+2\} = \pi(1)[1-\pi(1)] = \pi(1)\pi(0)$$

$$P\{y(j)=k+1\} = P\{\text{if } (j-1)\text{st error at time } i;\ j\text{th at time } i+1+k\} = \pi(1)[1-\pi(1)]^k = \pi(1)[\pi(0)]^k$$

Note that, using the geometric series summation formula,

$$\sum_{k=1}^{\infty} P\{y(j)=k\} = \frac{\pi(1)}{1-(1-\pi(1))} = 1$$

$$E[y(j)] = \sum_{k=1}^{\infty} k\,P\{y(j)=k\} = \frac{\pi(1)}{[1-(1-\pi(1))]^2} = \frac{1}{\pi(1)}$$
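
The two geometric series results can be checked numerically with a short Python sketch. The value of the error probability below is hypothetical and chosen large so that the truncated sums converge quickly; it is not the value for the telephone data.

```python
def geometric_pmf(k, p_error):
    """P{y = k} = p_error * (1 - p_error)**(k - 1), k = 1, 2, ..."""
    return p_error * (1.0 - p_error) ** (k - 1)


p = 0.2                                    # hypothetical pi(1)
ks = range(1, 500)                         # truncation of the infinite sums
total = sum(geometric_pmf(k, p) for k in ks)
mean = sum(k * geometric_pmf(k, p) for k in ks)
print(total, mean, 1.0 / p)
```

The printed total should be close to 1 and the printed mean close to 1/p, in agreement with the summation formulas.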



Now assume that the Markov structure of the zeros and
ones is described by the transition matrix

$$\begin{pmatrix} P(0,0) & P(0,1) \\ P(1,0) & P(1,1) \end{pmatrix} = \begin{pmatrix} \rho + (1-\rho)\pi(0) & (1-\rho)\pi(1) \\ (1-\rho)\pi(0) & \rho + (1-\rho)\pi(1) \end{pmatrix}$$

where $P(m,n) = P\{x(i+1) = n \mid x(i) = m\}$, and we have
parameterized the chain in terms of the stationary probability
of a one or zero, and a correlation parameter $0 \le \rho < 1$.
Note that there are only two degrees of freedom in the
stochastic matrix, since rows must sum to 1, and there is only
one degree of freedom if the stationary probability
$\pi(0) = 1 - \pi(1)$ is fixed. Note that the stationary
probabilities in the 2-state case are given by

$$\pi(0) = \frac{1 - P(1,1)}{2 - P(0,0) - P(1,1)}, \qquad \pi(1) = \frac{1 - P(0,0)}{2 - P(0,0) - P(1,1)}$$

We now define the runs of ones or zeros, i.e. for
$\ell = 0$ or $\ell = 1$, let

$$T_\ell = \inf\{n \ge 1 : x(i+n) \ne \ell\} - 1 ,$$

the length of a run of $\ell$'s starting after time i, where
the length can be 0,1,2,... .

For example, if x(i+1)=1, then the length of the run of
zeros starting after time i is zero, and the run of ones is
at least one long. Note that it is possible to talk of a
conditional runs structure, i.e. the length of a run of
ones which is given to start after time i. The run length
is then at least one long.

Now the probability of a run $T_\ell$ having length at
least k is, using the Markov property,

$$P\{T_\ell \ge k\} = P\{x(i+1) = x(i+2) = \cdots = x(i+k) = \ell\} = \pi(\ell)[P(\ell,\ell)]^{k-1}, \qquad k = 1, 2, \ldots$$

and $P\{T_\ell = 0\} = 1 - \pi(\ell)$.

Thus, the run lengths are geometrically distributed and

$$E[T_\ell] = \sum_{k=1}^{\infty} P\{T_\ell \ge k\} = \frac{\pi(\ell)}{1 - P(\ell,\ell)} = \frac{\pi(\ell)}{(1-\rho)[1-\pi(\ell)]}$$

Note that $\rho = 0$ gives the independence case, and while
the runs of ones or zeros are geometrically distributed for
both the independence and the Markov dependent model, the mean
run length is at least as long for the Markov dependence, since

$$\frac{\pi(\ell)}{(1-\rho)[1-\pi(\ell)]} \ge \frac{\pi(\ell)}{1-\pi(\ell)}, \qquad 0 \le \rho < 1$$
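
The mean run length comparison can be sketched numerically in Python. This is an illustration under an assumption: the diagonal transition probabilities are taken as P(l,l) = rho + (1 - rho) pi(l), the form implied by the mean run length pi(l) / ((1 - rho)(1 - pi(l))). The function names and the parameter values are hypothetical.

```python
def transition_matrix(pi1, rho):
    """Two-state chain with stationary P{one} = pi1 and correlation rho."""
    pi0 = 1.0 - pi1
    return [[rho + (1 - rho) * pi0, (1 - rho) * pi1],
            [(1 - rho) * pi0, rho + (1 - rho) * pi1]]


def mean_run_length(pi_l, rho):
    """Closed form pi(l) / ((1 - rho) * (1 - pi(l)))."""
    return pi_l / ((1 - rho) * (1 - pi_l))


def mean_run_length_by_series(pi_l, p_ll, terms=2000):
    """Truncated sum over k >= 1 of pi(l) * P(l,l)**(k - 1)."""
    return sum(pi_l * p_ll ** (k - 1) for k in range(1, terms + 1))


P = transition_matrix(0.3, 0.5)            # hypothetical pi(1) and rho
print(mean_run_length(0.3, 0.5))           # Markov dependent case
print(mean_run_length_by_series(0.3, P[1][1]))
print(mean_run_length(0.3, 0.0))           # independence case
```

The first two printed values should agree, and both exceed the third, which is the mean run length under independence.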

Thus, we could use the distributional properties of the
runs to (1) check that either hypothesis is tenable and (2)
if so, compare the estimated run lengths with the mean length
$\pi(\ell)/[1-\pi(\ell)]$ predicted by the independence
assumption. If the run lengths are not geometric, then another
model must be postulated.

Note that when the mean time-between-errors is large, as
it is for telephone data 1 (figure 1; E[y(j)] = 1,548), the
discreteness of the time scale can be ignored and the geometric
distribution is indistinguishable from its continuous time
analog, the exponential distribution.

That the approximation of the geometric distribution by
an exponential distribution is valid can be seen from the
fact that there are 672 errors (x(i)'s equal to one) in
1,106,148 transmitted bits, so that an estimate of $\pi(1)$,
which is the maximum likelihood estimate under the independence
hypothesis, is

$$\hat{\pi}(1) = \frac{\#\{x(i) = 1\}}{\text{total \# bits transmitted}} = \frac{\#\{x(i) = 1\}}{\#\{x(i) = 1\} + \#\{x(i) = 0\}}$$

In the present data

$$\hat{\pi}(1) = \frac{672}{1{,}106{,}148} = .0006075$$
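
The estimate and the closeness of the geometric and exponential survivor functions can be checked with a short Python sketch. The counts are those quoted in the text; the function names and the evaluation points are hypothetical.

```python
import math

errors = 672                               # number of x(i) equal to one
zeros = 1_105_476                          # number of x(i) equal to zero
total_bits = errors + zeros
pi1_hat = errors / total_bits              # estimate of pi(1)


def geometric_survivor(k, p):
    """P{y > k} for the geometric distribution on 1, 2, ..."""
    return (1.0 - p) ** k


def exponential_survivor(t, rate):
    """P{Y > t} for the exponential distribution with the given rate."""
    return math.exp(-rate * t)


print(total_bits, round(pi1_hat, 7))
for k in (100, 1000, 5000):
    print(k, geometric_survivor(k, pi1_hat), exponential_survivor(k, pi1_hat))
```

For an error probability this small the two survivor functions agree to several decimal places, which is the sense in which the time scale can be treated as continuous.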

Now this geometric hypothesis will be examined, but it
is clear from figure 1 that the hypothesis is not true. The
distribution is in fact highly skewed and has been examined
by Lewis & Cox (1966).

An alternative model to independent bit errors is that
the dependence structure is Markovian. One could examine
this hypothesis with time-series methods, but a method which
is adaptable for use with the histogram routine and which
examines both the independence and Markov assumptions is to
look at runs of ones and zeros in the x(i). Under both
hypotheses these runs have geometrically distributed lengths.

The alternating conditional runs of ones for telephone
data 1 are shown in figure 15 and the runs of zeros are shown
in figure 16. Also, HISTLIST was used on the conditional
runs: figure 17 shows the runs of ones and figure 18 shows
the runs of zeros.

To test the hypothesis that the runs of ones in telephone
data 1 are geometrically distributed the following was done.
Using figures 15 and 17 the following data was obtained:

MEAN     = 1.235294
VARIANCE =  .346008

RUN LENGTH     NUMBER OF RUNS
     1              444
     2               81
     3               15
     4                1
     5                2
    > 6               1
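
The quoted mean and variance can be reproduced from the run counts with a short Python sketch. One assumption is needed: the single run longer than 6 is taken to have length 7, which makes the total number of ones equal to 672 as stated for telephone data 1.

```python
# Run length -> number of runs of ones. The length 7 for the longest
# run is an assumption (the listing only says it exceeds 6).
run_counts = {1: 444, 2: 81, 3: 15, 4: 1, 5: 2, 7: 1}

n_runs = sum(run_counts.values())
n_ones = sum(length * count for length, count in run_counts.items())
mean = n_ones / n_runs
# sample variance with divisor (number of runs - 1)
variance = sum(count * (length - mean) ** 2
               for length, count in run_counts.items()) / (n_runs - 1)

print(n_runs, n_ones, round(mean, 6), round(variance, 6))
```

With this assumption the computed mean and variance round to the values 1.235294 and .346008 given above.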
[HIST output titled RUNS OF ONES FOR TELEPHONE DATA]

FIGURE 15

[HIST output titled RUNS OF ZERO FOR TELEPHONE DATA; panels CENTRAL TENDENCY, SPREAD, HIGHER CENTRAL MOMENTS, DISTRIBUTION]

FIGURE 16

[HISTLIST output]

FIGURE 17

FIGURE 18A

FIGURE 18B

[HISTLIST listing, continued]

FIGURE 18C
If the runs of ones are geometric, then
$$\mathrm{prob}\{x(i) = k\} = (1-p)\,p^{k-1}, \qquad k = 1, 2, \ldots$$
Thus, this is the "geometric plus one" distribution.



$$\mu = E[X] = \frac{1}{1-p}$$

$$\sigma^2 = \mathrm{VAR}[X] = \frac{p}{(1-p)^2}$$

$$C(X) = \frac{\mathrm{VAR}[X]^{1/2}}{E[X]} = p^{1/2}$$
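The moment expressions follow from the probability function by the standard geometric-series identities. A brief derivation is sketched here; the intermediate quantity $E[X(X-1)]$ is introduced only for this sketch and does not appear in the original text.

$$
\begin{aligned}
E[X] &= \sum_{k=1}^{\infty} k\,(1-p)\,p^{k-1} = \frac{1}{1-p}, \\
E[X(X-1)] &= \sum_{k=1}^{\infty} k(k-1)\,(1-p)\,p^{k-1} = \frac{2p}{(1-p)^2}, \\
\mathrm{VAR}[X] &= E[X(X-1)] + E[X] - E[X]^2
  = \frac{2p + (1-p) - 1}{(1-p)^2} = \frac{p}{(1-p)^2}, \\
C(X) &= \frac{\mathrm{VAR}[X]^{1/2}}{E[X]} = \frac{p^{1/2}}{1-p}\,(1-p) = p^{1/2}.
\end{aligned}
$$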



To find p, set E[X] = 1.235294 = 1/(1 - p), which gives

p = .1904761

Therefore, if the data is "geometric plus one" then

$$\text{EXPECTED VAR}[X] = \frac{.1904761}{(.8095239)^2} = .2906572$$

Thus, the expected variance is .2906572 and the observed
variance from HIST is .3460080 . Also, the expected coefficient
of variation is

$$\text{EXPECTED } C(X) = (.1904761)^{1/2} = .4364356$$

And, the observed coefficient of variation is .4761817 .

Therefore, at this point there seems to be a fairly close 
agreement between the runs of one and a "geometric plus one" 
distribution with p = .1904761 . 
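The moment-matching step above can be sketched in a few lines. This is a Python illustration rather than one of the APL library routines; the function name `geometric_plus_one_moments` is hypothetical, and the only input taken from the text is the observed mean 1.235294.

```python
import math


def geometric_plus_one_moments(mean):
    """Fit P(X = k) = (1 - p) * p**(k - 1), k = 1, 2, ..., by matching the mean.

    Returns (p, expected variance, expected coefficient of variation).
    """
    if mean <= 1.0:
        raise ValueError("the mean must exceed 1 for a geometric-plus-one fit")
    p = 1.0 - 1.0 / mean                 # from E[X] = 1 / (1 - p)
    variance = p / (1.0 - p) ** 2        # VAR[X] = p / (1 - p)^2
    coef_var = math.sqrt(p)              # C(X) = VAR[X]^(1/2) / E[X] = p^(1/2)
    return p, variance, coef_var


# Observed mean of the run lengths, as quoted in the text.
p, expected_var, expected_cv = geometric_plus_one_moments(1.235294)
print(f"p = {p:.7f}")
print(f"expected VAR[X] = {expected_var:.7f}")
print(f"expected C(X)   = {expected_cv:.7f}")
```

The printed values can be compared with the observed variance (.3460080) and observed coefficient of variation (.4761817) reported by HIST.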

As further proof, a Chi-square test for goodness of fit
was run on the runs. By using the formula

$$\mathrm{prob}\{X = x\} = (1-p)\,p^{x-1} \qquad \text{for } x = 1, 2, 3, 4, 5, \ldots$$
PROBABILITY               EXPECTED            OBSERVED

P(X=1)  = .8095239         440.38               444
P(X=2)  = .1541949          83.88                81
P(X=3)  = .0293704          15.98 ]              15
P(X=4)  = .0055943           3.04 ]  19.74
P(X=5)  = .0010655            .58 ]
P(X>=6) = .0002510            .14 ]               1



Note, to use Chi-square not more than 20% of the cells 
should have expected frequencies less than 5 and no cell 
should have an expected frequency less than one. Therefore, 
the above frequencies must be combined into 3 cells. 



$$\chi^2 = \sum_{i=1}^{3} \frac{(\mathrm{obs}_i - \mathrm{ex}_i)^2}{\mathrm{ex}_i} = .1562799$$



And, $\chi^2_{.05,\,2} = 5.99$ . Thus, the null hypothesis that the
runs of one are "geometric plus one" with p = .1904761 cannot
be rejected.
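The pooled Chi-square computation can be sketched as follows. This is a Python illustration, not one of the APL routines. Two values are assumptions that are not printed in the text: the total number of runs, 544, is inferred from the expected count 440.38 divided by .8095239, and the pooled observed count of 19 for X >= 3 is inferred as 544 - 444 - 81. The helper names are hypothetical.

```python
def geometric_plus_one_pmf(k, p):
    """P(X = k) = (1 - p) * p**(k - 1) for k = 1, 2, ..."""
    return (1.0 - p) * p ** (k - 1)


def chi_square_statistic(observed, expected):
    """Sum of (obs - ex)^2 / ex over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


p = 0.1904761
n_runs = 544  # assumption: inferred from 440.38 / .8095239, not stated in the text

# Expected counts for X = 1 and X = 2; every cell with X >= 3 is pooled
# so that no cell has an expected frequency below the stated limits.
expected = [n_runs * geometric_plus_one_pmf(1, p),
            n_runs * geometric_plus_one_pmf(2, p)]
expected.append(n_runs - sum(expected))

observed = [444, 81, 19]  # assumption: 19 = 544 - 444 - 81 for the pooled cell

chi2 = chi_square_statistic(observed, expected)
critical_value = 5.99     # critical value quoted in the text

print(f"chi-square = {chi2:.7f}")
print("reject" if chi2 > critical_value else "cannot reject")
```

Under these assumptions the statistic agrees with the value .1562799 quoted in the text to about four decimal places.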

A similar procedure was done with the runs of greater 
than one. By using figure 15 the following information can 
be obtained: 

MEAN = 1911.27
VARIANCE = 59,064,970
COEF. VAR. = 4.021082



And, by using the same method as previously done and solving 
for p one gets p = .9994767 . 

$$\text{EXPECTED VAR}[X] = \frac{.9994767}{(.0005233)^2} = 3,651,213$$

This expected variance differs greatly from the observed
variance. Also, the expected coefficient of variation is
computed to be

$$\text{EXPECTED } C(X) = (.9994767)^{1/2} = .9997383$$

This compares with the observed coefficient of variation of 
4.021082 . Because of the gross departures of the variance 
and the coefficient of variation in the geometric hypothesis, 
one can conclude that the runs of length greater than 1 are 
not geometrically distributed. 
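To put a number on the size of this departure, the same moment comparison can be expressed as ratios of observed to expected values. This is a Python sketch using only the summary figures quoted above; the variable names are illustrative. Because p is recomputed here from the mean at full precision, the expected variance differs slightly from the printed 3,651,213.

```python
import math

# Summary statistics quoted in the text for the second set of runs.
mean_obs = 1911.27
var_obs = 59_064_970.0
cv_obs = 4.021082

# Moments implied by a "geometric plus one" fit with the same mean.
p = 1.0 - 1.0 / mean_obs
var_exp = p / (1.0 - p) ** 2
cv_exp = math.sqrt(p)

var_ratio = var_obs / var_exp
cv_ratio = cv_obs / cv_exp

print(f"p = {p:.7f}")
print(f"observed / expected variance = {var_ratio:.1f}")
print(f"observed / expected C(X)     = {cv_ratio:.2f}")
```

A ratio near 1 would be consistent with the geometric hypothesis; here the observed variance is more than ten times the expected value.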
IX. DOCUMENTATION ON ROUTINES



A. LOCATION IN APL LIBRARY 

The descriptions and routines that have been presented
are all available in the APL workspace library 2 DATALFNS .
Provided the user is properly logged on to the terminal and
in APL mode, all that is necessary is to type )LOAD 2
DATALFNS . If the user then types DESCRIBE, a short description
of the six routines is printed, together with instructions on
how to obtain the detailed information that is available in
each of the "HOW" variables.

B. WORKSPACE LOADING PROCEDURES 

Each of the routines was designed to stand alone. That
is, if the user desires just to use HIST , all that is necessary
is to type )COPY 2 DATALFNS HISTGRP into a clear workspace.
HISTGRP contains the principal routine HIST and only the
additional routines necessary for HIST to operate. Thus, the
user does not clutter his workspace with any unneeded
functions. It is this group structure that maintains the
orderliness of the workspace. In addition, copying a single
group into a clear workspace leaves more space for data and
for execution of the functions.

The following is the group structure in library 2 
DATALFNS . 
GROUP          PRINCIPAL    OTHER NECESSARY ROUTINES
               ROUTINE      AND VARIABLES

HISTGRP        HIST         APLNAME, APLOT, AUTOS, CMS, DFT, ECDF,
                            ECODE, EFT, OF, OUT, WRITE

HISTLISTGRP    HISTLIST     APLNAME, CMS, ECODE, DFT, OF, OUT, WRITE

HISTSGRP       HISTS        DFT, EFT

HISTJACKGRP    HISTJACK     DFT, EFT, TOT

EXPONPGRP      EXPONP       AND, AUTOSCALE, BS, INITIAL, MPLOT, MSGS,
                            VS, MULTIPLOT, SETAAP, TICMARK

NORMPGRP       NORMP        AND, AUTOSCALE, BS, INITIAL, MPLOT, MSGS,
                            VS, MULTIPLOT, SETAAP, TICMARK

DESCGRP (Descriptive group) DESCRIBE, HISTHOW, HISTHOW, HISTLISTHOW,
                            HISTJACKHOW, EXPONPHOW, NORMPHOW

VARIGRP (Variable group)    TELDAT1, TELDAT2, YROVR



C. ROUTINE LISTING 

The above-mentioned routines were either created by the
author, adapted from the existing FORTRAN routine HISTG/F , or
borrowed from the current APL library to supplement the
author-created routines.



1. Author Created Routines

HISTLIST, HISTS, HISTJACK, EXPONP, NORMP, APLOT,
AUTOS, OUT, TOT

2. Adapted from Fortran Library Routine HISTG/F

HIST, ECDF

3. Borrowed Routines to Supplement Author Created
Routines

AND, APLNAME, AUTOSCALE, CMS, DFT, ECODE, EFT,
INITIAL, MPLOT, MSGS, MULTIPLOT, NDTRI, OF, SETAAP, TICMARK,
VS, WRITE
X. COMPUTER LISTING OF ALL ROUTINES
[39] TABJ-.-*( Z*Q )/TAB2A

[40] X+-{ (Xi4[2] )aUs/I[ 3]))/* 

[41] TAB2A :C[ 12 ]*- ( C[ 2 7 ]<-X[ pX ] )-C[2 1 ]<-( X«-X[ i,X ] )[ 1 ] 

[42] *(-4[l]*OA/|[2]*OA/l[3]*0)/7\4fll/l 
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